7 research outputs found

    Constraint-based Sequential Pattern Mining with Decision Diagrams

    Constrained sequential pattern mining aims at identifying frequent patterns on a sequential database of items while observing constraints defined over the item attributes. We introduce novel techniques for constraint-based sequential pattern mining that rely on a multi-valued decision diagram representation of the database. Specifically, our representation can accommodate multiple item attributes and various constraint types, including a number of non-monotone constraints. To evaluate the applicability of our approach, we develop an MDD-based prefix-projection algorithm and compare its performance against a typical generate-and-check variant, as well as a state-of-the-art constraint-based sequential pattern mining algorithm. Results show that our approach is competitive with or superior to these other methods in terms of scalability and efficiency. Comment: AAAI201
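The abstract contrasts prefix projection with generate-and-check and stresses non-monotone constraints. As a rough illustration only, the Python sketch below implements a generic PrefixSpan-style prefix projection over plain lists of single items; it does not use the multi-valued decision diagram representation the paper introduces. It assumes one hypothetical numeric attribute per item (the `value` mapping) and an average constraint `lo <= mean <= hi`. Because the average is non-monotone, the sketch prunes only on frequency and checks the constraint when each pattern is reported.

```python
from collections import defaultdict


def project(db, item):
    """Suffix of each sequence after the first occurrence of item; sequences without it are dropped."""
    return [seq[seq.index(item) + 1:] for seq in db if item in seq]


def mine(db, min_support, value, lo, hi, prefix=()):
    """Prefix-projection mining with a (hypothetical) average-attribute constraint.

    Frequency is anti-monotone, so infrequent items are pruned together with all
    their extensions. The average constraint is non-monotone, so it is only used
    to filter reported patterns (generate-and-check), never to prune the search.
    """
    results = []
    # Sequence support: count each item at most once per projected sequence.
    support = defaultdict(int)
    for seq in db:
        for item in set(seq):
            support[item] += 1
    for item in sorted(support):
        if support[item] < min_support:
            continue  # no extension of this pattern can be frequent either
        pattern = prefix + (item,)
        avg = sum(value[i] for i in pattern) / len(pattern)
        if lo <= avg <= hi:
            results.append((pattern, support[item]))
        # Recurse on the database projected by the newly appended item.
        results.extend(mine(project(db, item), min_support, value, lo, hi, pattern))
    return results


db = [["a", "b", "c"], ["a", "c"], ["b", "c"]]
value = {"a": 1, "b": 5, "c": 3}  # hypothetical item attribute
print(mine(db, min_support=2, value=value, lo=2, hi=4))
# [(('a', 'c'), 2), (('b', 'c'), 2), (('c',), 3)]
```

In this toy run, `("a",)` violates the range with an average of 1, yet its extension `("a", "c")` satisfies it with an average of 2, which is why a non-monotone constraint cannot safely prune the search here.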

    The Continuous Time Service Network Design Problem

    The service network design problem (SNDP) addresses the planning of operations for freight transportation carriers. Given a set of requests to transport commodities from specific origins to specific destinations, SNDP determines a continuous movement of vehicles to service demand. Demand becomes available for pick up at its origin by a given availability time and has to be dropped off at its destination by a given delivery deadline. The transportation plan considers matters of vehicle routing, consolidation, service schedules, empty vehicle repositioning, assignment of freight to operating vehicles, and vehicle stops and waiting times. The literature studies a periodic time approach to SNDP. This thesis generalizes the periodic time approach to SNDP by introducing a continuous time network and model. Several network and model reduction techniques are introduced, and a multi-cut Benders decomposition is developed to solve the continuous time model. To improve the convergence of Benders decomposition, we strengthen the algorithm with a family of valid inequalities for SNDP. Numerical results show the benefits of the continuous time approach. Substantial reductions in computational effort and improved lower bounds are achieved by the multi-cut Benders decomposition algorithm.
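To see why modelling time continuously can matter, consider a single commodity moving along a fixed path with an availability time and a delivery deadline. The Python sketch below is a deliberately simplified illustration under stated assumptions, not the thesis's network, model, or Benders decomposition: it ignores vehicles, consolidation, and capacities, and it models a discretized alternative as rounding each departure and arrival up to the next multiple of a hypothetical `period`. Under that rounding, travel times are overestimated, so a path that meets the deadline in continuous time can appear infeasible.

```python
import math


def earliest_arrival(available, travel_times, period=None):
    """Earliest arrival time along a fixed path of arcs.

    period=None: continuous time, each arc departs as soon as the freight arrives.
    period=p: assumed discretization that rounds departures and arrivals up to
    the next multiple of p (a conservative time-expanded approximation).
    """
    t = available
    for tau in travel_times:
        if period is None:
            t += tau
        else:
            depart = math.ceil(t / period) * period
            t = math.ceil((depart + tau) / period) * period
    return t


def meets_deadline(available, deadline, travel_times, period=None):
    """True if the commodity reaches its destination by its delivery deadline."""
    return earliest_arrival(available, travel_times, period) <= deadline


# Hypothetical commodity: available at t=0, deadline t=6, two arcs of 2.5 time units.
print(meets_deadline(0, 6, [2.5, 2.5]))            # arrives at 5.0 -> True
print(meets_deadline(0, 6, [2.5, 2.5], period=2))  # arrives at 8 -> False
```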

    Interpretable Learning and Pattern Mining: Scalable Algorithms and Data-Driven Applications

    In recent years, data-driven methodologies have enjoyed great success, due in large part to the increasing accessibility of highly accurate machine learning tools. The challenge, however, is that such tools generally offer little interpretability in their learning tasks. This is a significant concern in many applications, such as scenarios with ethical implications or high risk. On the other hand, current interpretable learning algorithms are typically less accurate for prediction tasks and comparatively less scalable to large datasets. This motivates the theme of this dissertation, which is to develop interpretable, yet accurate and scalable, learning algorithms and apply them to real-life data-driven applications in management science. As our first problem, we investigate the Multiple Sequence Alignment (MSA) problem. Our aim is to learn the optimal alignment between sequences of data. Applications of MSA include bioinformatics, high-frequency trading, speech recognition, and computer vision. Although higher quality alignments offer significantly more insight for practitioners, most MSA algorithms are heuristic and have been shown to often generate alignments far below the accuracy of an optimal solution on benchmark instances. In fact, only one viable exact algorithm has been developed for MSA, which is limited to solving small instances with five sequences and a total of 600 data entries. Using tools from dynamic programming, mathematical programming, constraint programming, and decomposition techniques, we develop a novel exact alignment algorithm capable of aligning up to ten sequences and 1,600 total data entries. Our method solves 37 of 51 real-life benchmark instances to optimality for the first time and considerably improves the alignment quality on the remaining instances. In the next chapter, we develop novel techniques for constraint-based sequential pattern mining (SPM).
SPM is an unsupervised learning technique that involves finding frequent patterns in data. Frequent patterns are used, for example, to extract knowledge from data and to develop novel association rules. Constraint satisfaction is often critical in the practice of SPM to avoid overwhelming the practitioner with millions of uninteresting patterns. Unfortunately, the literature accommodates only simple constraints and is further limited to databases with up to a million data entries. Using novel modelling techniques, we increase the scalability of our SPM algorithm by an order of magnitude, and further design and prove constraint-specific information for a number of complex constraints common in practice. Our algorithm is the first to handle complex constraints such as average and median, and it is also competitive with or more efficient than a state-of-the-art SPM algorithm on simple constraints. We next study data-driven sequential decision making with an emphasis on interpretability and sequential structure in SPM. Using a novel data tree model of the database, we increase the capacity of SPM algorithms from databases with ten million to three billion data entries. We leverage data trees to design a pattern mining algorithm capable of extracting novel sequential patterns from large datasets, and we design an interpretable knowledge tree equipped with statistical hypothesis tests to increase reliability in data-driven decision making. Using our approach, we investigate two large real-world applications in marketing and finance. In marketing, we consider reducing the skip rate of users on an online music streaming platform. We find that almost all of the one billion user skips in the database can be explained using an average of 6,400 sequential patterns, with an average likelihood of 83%. In finance, we assess whether historical sequential patterns of price changes can aid investment decision making in the stock market.
We find that, at best, 80% of nine hundred thousand monthly price change events can be explained using approximately 7,000 sequential patterns, with a low average likelihood of 53%. In the final chapter, we study the provider network selection and insurance design problem faced by a major healthcare insurance provider. The provider network consists of physicians and hospitals that are under contract with the insurer and offer a wide range of healthcare services to insured patients. The problem involves choosing or changing which physicians and hospitals to contract and designing insurance plans that target patients under competition with a rival firm. We provide a novel methodology that incorporates the literature's interpretable utility approach for patient behavior into a simultaneous multi-column-and-row generation optimization framework. By optimizing the provider network as a whole, we compose higher quality insurance plans that increase the profit of the insurer by an average of 548% on test instances, while decreasing overall patient healthcare costs by 36% and the overall premiums paid by patients by 21%. We show how interpretable information can be extracted from the optimization framework to aid decision making, and lastly analyze the impact of inaccurate patient predictions on the insurer. Our results show that a more accurate yet interpretable predictor of patient choice can greatly enhance insurance plan design.
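The MSA chapter lists dynamic programming among its tools. For intuition about why exact alignment has been limited to small instances, the Python sketch below computes the optimal sum-of-pairs alignment score with a textbook dynamic program over the k-dimensional lattice of sequence positions. It is not the dissertation's decomposition-based algorithm, and the scoring (match +1, mismatch -1, gap -1, gap-gap 0) is an arbitrary assumption. The number of states is the product of (length + 1) over all sequences, and each state tries 2^k - 1 column types, so the cost grows exponentially with the number of sequences k.

```python
from functools import lru_cache
from itertools import product

GAP = "-"


def pair_score(a, b, match=1, mismatch=-1, gap=-1):
    """Score one pair of symbols within an alignment column (assumed scoring)."""
    if a == GAP and b == GAP:
        return 0  # gap-gap pairs are conventionally free in sum-of-pairs
    if a == GAP or b == GAP:
        return gap
    return match if a == b else mismatch


def column_score(column):
    """Sum-of-pairs score of one alignment column."""
    return sum(
        pair_score(column[i], column[j])
        for i in range(len(column))
        for j in range(i + 1, len(column))
    )


def exact_msa_score(seqs):
    """Optimal sum-of-pairs score via DP over all tuples of sequence positions."""
    k = len(seqs)
    lengths = tuple(len(s) for s in seqs)
    # Each column consumes the next symbol from a nonempty subset of the sequences.
    moves = [m for m in product((0, 1), repeat=k) if any(m)]

    @lru_cache(maxsize=None)
    def best(pos):
        if pos == lengths:
            return 0  # every sequence fully aligned
        result = float("-inf")
        for m in moves:
            nxt = tuple(p + d for p, d in zip(pos, m))
            if any(n > l for n, l in zip(nxt, lengths)):
                continue  # would consume past the end of some sequence
            column = tuple(seqs[i][pos[i]] if m[i] else GAP for i in range(k))
            result = max(result, column_score(column) + best(nxt))
        return result

    return best((0,) * k)


print(exact_msa_score(["AC", "A"]))  # 0: A/A match (+1), then C against a gap (-1)
```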

    The Interaction of Helicobacter pylori Infection and Type 2 Diabetes Mellitus

    Helicobacter pylori is one of the most common human pathogens and can cause gastrointestinal (GI) disorders, including simple gastritis, gastric ulcer, and malignant gastritis. In some settings, such as immunodeficiency or underlying disease, it can cause problematic opportunistic infections. Type 2 diabetes mellitus (T2DM) is one such underlying disease. Since GI problems are observed in diabetic patients, H. pylori infection in these patients needs to be treated. In this review, we evaluate the possible relationship between H. pylori and T2DM based on epidemiological surveys from 70 studies retrieved from databases including Scopus, PubMed, and Google Scholar, and we discuss the mechanisms reported to underlie this correlation. The reviewed studies show that H. pylori is more prevalent in patients with type 2 diabetes than in healthy individuals or nondiabetic patients. This has been attributed to H. pylori infection-induced inflammation, the production of inflammatory cytokines, and various hormonal imbalances caused by this bacterium, all of which are associated with diabetes mellitus. Moreover, given the detection of anti-H. pylori antibodies in patients with diabetes mellitus and the occurrence of symptoms such as digestive problems in >75% of these patients, it can be concluded that there is a relationship between this bacterium and T2DM. Considering this evidence, it is crucially important to evaluate the probability of H. pylori infection in patients with T2DM so that their medical care can proceed with greater caution.

    Seq2Pat: Sequence-to-Pattern Generation for Constraint-Based Sequential Pattern Mining

    Pattern mining is an essential part of knowledge discovery and data analytics. It is a powerful paradigm, especially when combined with constraint reasoning. In this paper, we present Seq2Pat, a constraint-based sequential pattern mining tool with a high-level declarative user interface. The library finds patterns that frequently occur in large sequence databases subject to constraints. We highlight key benefits that are desirable, especially in industrial settings where scalability, explainability, rapid experimentation, reusability, and reproducibility are of great interest. We then showcase an automated feature extraction process powered by Seq2Pat to discover high-level insights and boost downstream machine learning models for customer intent prediction.
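One common way to turn sequential patterns into inputs for a downstream model, in the spirit of the feature extraction described above, is to encode each sequence with one binary indicator per pattern. The Python sketch below does this without Seq2Pat; the function names, the event names, and the hand-picked patterns are hypothetical, and Seq2Pat's own API and feature-extraction pipeline may differ. A pattern counts as present when its items occur in order, with gaps allowed.

```python
def contains_pattern(sequence, pattern):
    """True if pattern occurs in sequence as a subsequence (order kept, gaps allowed)."""
    it = iter(sequence)
    # `item in it` advances the shared iterator, so later items must appear later.
    return all(item in it for item in pattern)


def pattern_features(sequences, patterns):
    """One binary feature per pattern: 1 if the sequence contains it, else 0."""
    return [[int(contains_pattern(seq, pat)) for pat in patterns] for seq in sequences]


# Hypothetical clickstreams and patterns (in practice the patterns would be mined).
sequences = [
    ["home", "search", "cart", "checkout"],
    ["home", "product", "cart"],
]
patterns = [("home", "cart"), ("search", "checkout"), ("product", "cart")]
print(pattern_features(sequences, patterns))  # [[1, 1, 0], [1, 0, 1]]
```

The resulting rows can be appended to other features of each customer before training a classifier.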

    Kawasaki disease in children: a retrospective cross-sectional study

    Introduction: Kawasaki disease (KD) is a systemic vasculitis seen mostly in children. The epidemiology of KD depends on geographical location and seasonality. Although many years have passed since the first report of KD, multiple related factors are still unknown. Material and methods: We investigated the clinical, paraclinical, and therapeutic aspects of KD in Kerman, Iran, by performing a retrospective, descriptive, cross-sectional study of all children hospitalized due to KD between 2007 and 2020. Results: A total of 340 patients with a mean ± SD age of 29.83 ± 22.55 months participated in the study. Most of our patients were two to five years old. The male-to-female ratio was ~1.4:1. A few of our patients had a family history of KD or vasculitis (0.3% and 1.7%, respectively). Typical KD was far more common (316 of 340 patients). More than half of our patients were hospitalized for under ten days. All of our patients were febrile. Hand/foot and lip/mouth changes were the second and third most common clinical findings, occurring in more than 60% of our patients. Other manifestations were conjunctivitis in 40%, skin rashes in 34.8%, gastrointestinal manifestations in 33.9%, and lymphadenopathy in 25.3%. Echocardiography revealed abnormalities in 78.6% of the participants; coronary artery aneurysm (CAA) was the most frequent (22.5%), and follow-up echocardiography showed that all aneurysms regressed within 6 months after treatment. The two laboratory tests with the highest rates of abnormality were erythrocyte sedimentation rate (95%) and hemoglobin (83.3%). C-reactive protein and liver function tests were also abnormal in most patients. All of our patients received intravenous immunoglobulin and acetylsalicylic acid. Conclusions: Kawasaki disease must be considered in every febrile child, especially those with risk factors, because timely diagnosis and treatment are essential to prevent complications.
Health policies should focus on appropriate diagnosis and treatment to prevent the occurrence of sequelae.